Retrieving Customary Web Language to Assist Writers

Authors

  • Benno Stein
  • Martin Potthast
  • Martin Trenkmann
Abstract

This paper introduces NETSPEAK, a Web service that assists writers in finding adequate expressions. To provide statistically relevant suggestions, the service indexes more than 1.8 billion n-grams, n ≤ 5, along with their occurrence frequencies on the Web. If in doubt about a wording, a user can specify a query that has wildcards inserted at those positions where she feels uncertain. Queries define patterns for which a ranked list of matching n-grams, along with usage examples, is retrieved. The ranking reflects the occurrence frequencies of the n-grams and conveys both absolute and relative usage. Given this choice of customary wordings, one can easily select the most appropriate one. Second-language speakers in particular can learn about style conventions and language usage. To guarantee response times within milliseconds, we have developed an index that considers occurrence probabilities, allowing for biased sampling during retrieval. Our analysis shows that the extreme speedup obtained with this strategy (a factor of 68) comes without significant loss in retrieval quality.

C. Gurrin et al. (Eds.): Advances in Information Retrieval, Proceedings of the 32nd European Conference on Information Retrieval, ECIR 2010, Milton Keynes, UK, pp. 631-635, © Springer 2010.
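The wildcard queries described in the abstract can be illustrated with a minimal Python sketch. It is not the NETSPEAK implementation: it scans a tiny in-memory table linearly instead of using the probability-aware index with biased sampling that the paper reports. The `?` wildcard (standing for exactly one word), the function names, and the counts in `TOY_NGRAMS` are all assumptions made up for this illustration; the counts are not real Web frequencies.

```python
from typing import Dict, List, Tuple

# Toy n-gram counts, invented for illustration only; not real Web frequencies.
TOY_NGRAMS: Dict[str, int] = {
    "waiting for response": 600,
    "waiting for a response": 300,
    "waiting on response": 100,
    "looking for response": 50,
}


def matches(pattern: List[str], ngram: List[str]) -> bool:
    """Return True if ngram fits pattern, where "?" stands for exactly one word."""
    if len(pattern) != len(ngram):
        return False
    return all(p == "?" or p == w for p, w in zip(pattern, ngram))


def query(pattern: str, ngrams: Dict[str, int]) -> List[Tuple[str, int, float]]:
    """Return (n-gram, absolute count, relative share), most frequent first."""
    pat = pattern.split()
    hits = [(g, c) for g, c in ngrams.items() if matches(pat, g.split())]
    total = sum(c for _, c in hits)
    hits.sort(key=lambda gc: gc[1], reverse=True)
    # The list is empty when nothing matched, so total is never a zero divisor.
    return [(g, c, c / total) for g, c in hits]


results = query("waiting ? response", TOY_NGRAMS)
for ngram, count, share in results:
    print(f"{ngram}\t{count}\t{share:.1%}")
```

Each result carries the absolute count and the share among all matches, mirroring the abstract's point that the ranking conveys both absolute and relative usage.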


Similar resources

Proceedings of the Seventh Web as Corpus Workshop (WAC 7)

We will discuss backgrounds, technology, and applications developed in the Webis Research Group, whereas the talk’s common thread is the exploitation of the web as a corpus. Three different applications will reveal different rationales and possibilities when operationalizing text reuse and language reuse on a large scale. 1. The Netspeak word search engine reuses the web as a corpus of writing ...


Impact of Controlled and Free Language Use in Retrieving Articles from the ProQuest and Science Direct Databases

Introduction: The growth and expansion of the Internet have changed the way information is accessed, and many facilities have been created on the Web to facilitate and expedite locating information. Objective: To identify the impact of keyword documentation using the medical thesaurus on the retrieval of articles from the ProQuest and Science Direct databases. Materials and Methods: The pr...


E-Tools to Assist EFL Learners' Writing Skill: Wikis, Weblogs, and Podcasts

One of the promises of web-based education is to help students take control of their learning pace as the basic requirement of language learning is being life-long. The purpose of the present study was to find out which of the e-tools -- weblogs, wikis, or podcasts -- can better help EFL learners excel in their writing skill. To this end, 156 Iranian sophomore students majoring in English and s...


Impact of Online Setting Collaboration through Strategy-Based Instruction on EFL Learners’ Self-efficacy and Oral Skills

This study aimed to investigate the impact of web-based cooperative teaching through strategy-based instruction on EFL learners’ speaking and listening skills. Moreover, the use of cooperative teaching was hypothesized to have impact on the EFL learners’ self-efficacy. To this purpose, the study followed a mixed-methods design by implementing both qualitative and quantitative data gathering pro...


QUT_Para at TREC 2012 Web Track: Word Associations for Retrieving Web Documents

Many existing information retrieval models do not explicitly take into account information about word associations. Our approach makes use of first and second order relationships found in natural language, known as syntagmatic and paradigmatic associations, respectively. This is achieved by using a formal model of word meaning within the query expansion process. On ad hoc retrieval, our approac...



Publication date: 2010